A Data Driven Approach to Audiovisual Speech Mapping
Abstract
Using visual information as part of audio speech processing has attracted significant recent interest. This paper presents a data-driven approach that estimates audio speech acoustics from temporal visual information alone, without considering linguistic features such as phonemes and visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various configurations of multilayer perceptrons (MLPs) and datasets are evaluated to identify the best-performing setup. The results show that, given a sequence of prior visual frames, a reasonably accurate estimate of the corresponding audio frame can be produced.
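The mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame size, the number of retained DCT coefficients, the number of prior frames, the hidden-layer width, the 23 filterbank channels, and all function names are assumptions chosen for illustration, and the MLP weights are random rather than trained.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return m

def dct2_features(frame, keep=4):
    """2D-DCT of a grayscale frame, keeping the low-frequency keep x keep block."""
    h, w = frame.shape
    coeffs = dct_matrix(h) @ frame @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()

def stack_prior_frames(features, t, k):
    """Concatenate the k visual feature vectors ending at frame t."""
    return features[t - k + 1 : t + 1].ravel()

def mlp_forward(x, w1, b1, w2, b2):
    """One-hidden-layer MLP: tanh hidden units, linear output."""
    hidden = np.tanh(x @ w1 + b1)
    return hidden @ w2 + b2

# All sizes below are hypothetical; the source does not specify them.
rng = np.random.default_rng(0)
n_frames, height, width = 20, 16, 16
k_prior, keep, n_hidden, n_filterbank = 5, 4, 32, 23

video = rng.random((n_frames, height, width))  # stand-in for mouth-region frames
visual = np.stack([dct2_features(f, keep) for f in video])

x = stack_prior_frames(visual, t=10, k=k_prior)

# Untrained random weights: only the shape of the mapping is demonstrated.
w1 = rng.normal(scale=0.1, size=(x.size, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=(n_hidden, n_filterbank))
b2 = np.zeros(n_filterbank)

audio_estimate = mlp_forward(x, w1, b1, w2, b2)  # one log-filterbank frame
```

In a real setup the weights would be fitted by regression against log filterbank frames extracted from the synchronized audio track, and the number of prior frames would be one of the configurations compared.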
Related resources
Automatic Viseme Clustering for Audiovisual Speech Synthesis
A common approach in visual speech synthesis is the use of visemes as atomic units of speech. In this paper, phoneme-based and viseme-based audiovisual speech synthesis techniques are compared in order to explore the balance between data availability and improved audiovisual coherence for synthesis optimization. A technique for automatic viseme clustering is described and it is compared to ...
The Cortical Representation of the Speech Envelope is Earlier for Audiovisual Speech than Audio Speech
Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that electro- and magnetoencephalographic (EEG/MEG) data are usually analyzed in response to simple, discret...
The cortical representation of the speech envelope is earlier for audiovisual speech than audio speech.
Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research ha...
A clustering approach for mineral potential mapping: A deposit-scale porphyry copper exploration targeting
This work describes a knowledge-guided clustering approach for mineral potential mapping (MPM), by which the optimum number of clusters is derived from a knowledge-driven methodology through a concentration-area (C-A) multifractal analysis. To implement the proposed approach, a case study at the North Narbaghi region in the Saveh, Markazi province of Iran, was investigated to discover porphyry ...
Face Synthesis Driven by Audio Speech Input Based on Hmms
In this paper, an HMM-based visual speech system driven by audio speech input is designed to render a face model while synchronous audio is played. Our approach differs considerably from the methods adopted by many other researchers. We first train the models for every final and initial in Mandarin. In this process, a large quantity of audio training data under differe...
Publication date: 2016